On Combining Feature Selection and Over-Sampling Techniques for Breast Cancer Prediction
Abstract
Breast cancer prediction datasets are usually class imbalanced: the numbers of data samples in the malignant and benign patient classes are significantly different. Over-sampling techniques can be used to re-balance the datasets and construct more effective prediction models. Moreover, some related studies have considered feature selection to remove irrelevant features from the datasets for further performance improvement. However, since the order of combining feature selection and over-sampling can result in different training sets for constructing the prediction model, it is unknown which order performs better. In this paper, the information gain (IG) and genetic algorithm (GA) based feature selection methods and the synthetic minority over-sampling technique (SMOTE) are used in different combinations. The experimental results based on two breast cancer datasets show that the combination of feature selection and over-sampling outperforms the single usage of either feature selection or over-sampling for the highly class imbalanced datasets. In particular, performing IG first and SMOTE second is the better choice. For other datasets with a small class imbalance ratio and a smaller number of features, performing SMOTE alone is enough to construct an effective prediction model.
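The ordering the abstract recommends for highly imbalanced data (IG first, SMOTE second) can be sketched roughly as follows. This is a minimal standard-library illustration, not the authors' implementation: the helper names (`information_gain`, `select_top_k`, `smote`, `ig_then_smote`), the median-based binarization used to compute IG on numeric features, the neighbour count `k=3`, and the toy data are all assumptions made for the example. It also assumes at least two minority samples, and it would normally be applied to the training split only.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    """IG of one numeric feature, binarized at its median (illustrative choice)."""
    threshold = sorted(column)[len(column) // 2]
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value >= threshold, []).append(label)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def select_top_k(X, y, k):
    """Indices of the k features with the highest information gain."""
    gains = [information_gain([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)[:k]

def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base row (Euclidean distance)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

def ig_then_smote(X, y, k_features, minority_label=1):
    """Feature selection first, over-sampling second."""
    keep = select_top_k(X, y, k_features)
    X_sel = [[row[j] for j in keep] for row in X]
    minority = [row for row, label in zip(X_sel, y) if label == minority_label]
    n_new = (len(y) - len(minority)) - len(minority)  # bring classes to parity
    new_rows = smote(minority, n_new) if n_new > 0 else []
    return X_sel + new_rows, y + [minority_label] * len(new_rows), keep

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[i * 0.1, (i * 7) % 5] for i in range(20)] + [[5 + i * 0.1, (i * 3) % 5] for i in range(4)]
y = [0] * 20 + [1] * 4

X_bal, y_bal, keep = ig_then_smote(X, y, k_features=1)
print("selected feature indices:", keep)
print("class counts after SMOTE:", Counter(y_bal))
```

Reversing the order (SMOTE first, IG second) would compute the gains on a dataset that already contains synthetic rows, which is why the two orderings can yield different training sets.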
Similar resources
Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets
Objective(s): This study addresses feature selection for breast cancer diagnosis. The present process uses a wrapper approach with GA-based feature selection and a PS-classifier. The results of the experiment show that the proposed model is comparable to the other models on Wisconsin breast cancer datasets. Materials and Methods: To evaluate the effectiveness of the proposed feature selection method, we ...
Feature Selection Techniques on Thyroid, Hepatitis, and Breast Cancer Datasets
In the last century, the challenge was to develop new technologies that store large amounts of data. Recently, the challenges have been to effectively utilize the incredible amount of data and to obtain knowledge that benefits business, scientific, and government transactions by using a subset of features rather than all the features in the dataset. In the present paper, we have focused on feature selection ...
Optimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines
In this paper, principles and existing feature selection methods for classifying and clustering data are introduced. To that end, categorizing frameworks for finding selected subsets, namely, search-based and non-search-based procedures, as well as evaluation criteria and data mining tasks are discussed. In the following, a platform is developed as an intermediate step toward developing an intell...
Journal
Journal title: Applied Sciences
Year: 2021
ISSN: 2076-3417
DOI: https://doi.org/10.3390/app11146574